What is Regression?


Regression is a method for understanding the
relationship between independent variables (features)
and a dependent variable (outcome).


Outcomes can then be predicted once the
relationship between the independent and
dependent variables has been estimated.


For example, you might build a regression
model to predict a worker's hourly wage
using variables such as education level,
gender, experience, and skill set.
Types of Regression Analysis
Simple Linear Regression


It is a type of regression used to model the
relationship between a dependent variable and a
single independent variable. For example, it could
be used to predict an employee's salary from
their qualifications.
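As a minimal sketch of the idea, the snippet below fits a least-squares line with NumPy. The data (years of education and salary in thousands) and the function name fit_simple_linear are made up for illustration and are not part of the original slides.

```python
import numpy as np

# Hypothetical data: years of education (x) and salary in thousands (y).
years_of_education = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
salary = np.array([30.0, 36.0, 41.0, 48.0, 55.0])


def fit_simple_linear(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope is the covariance of x and y divided by the variance of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_simple_linear(years_of_education, salary)

# Predict the salary for a hypothetical employee with 15 years of education.
predicted_salary = slope * 15.0 + intercept
print(slope, intercept, predicted_salary)
```

The slope can be read as the estimated change in salary for each additional year of education, under the assumption that the relationship is roughly linear.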
Multiple Linear Regression


It is a type of regression used to model the
relationship between a dependent variable and
two or more independent variables. For example,
it could be used to predict an employee's salary
from their qualifications, experience, and city.
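A rough sketch of multiple linear regression with two numeric features is shown below, using NumPy's least-squares solver. The data is synthetic and the function name fit_multiple_linear is hypothetical. A categorical feature such as city would first need to be encoded numerically (for example with one-hot columns), which is left out here.

```python
import numpy as np

# Hypothetical features: [years of education, years of experience].
features = np.array([
    [12.0, 1.0],
    [12.0, 5.0],
    [16.0, 2.0],
    [16.0, 8.0],
    [18.0, 4.0],
])

# Synthetic salaries generated exactly from 5 + 2 * education + 1.5 * experience,
# so the fit should recover these coefficients.
salary = 5.0 + 2.0 * features[:, 0] + 1.5 * features[:, 1]


def fit_multiple_linear(X, y):
    """Return [intercept, coef_1, ..., coef_k] from an ordinary least-squares fit."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefficients


coefficients = fit_multiple_linear(features, salary)
print(coefficients)
```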


Polynomial Regression 


It is a type of regression that models the
relationship between a dependent variable and
an independent variable by fitting a polynomial
equation to the data. For example, it has been
applied to model the spread rate of COVID-19 and
other infectious diseases.
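To illustrate, the sketch below fits a degree-2 polynomial with NumPy's polyfit. The case counts are synthetic values generated from a known quadratic, not real epidemiological data, and the variable names are chosen for illustration only.

```python
import numpy as np

# Synthetic data: day index and case counts following 2 * day^2 + 3 * day + 1.
days = np.arange(6, dtype=float)
cases = 2.0 * days**2 + 3.0 * days + 1.0

# Fit a degree-2 polynomial; coefficients are returned highest power first.
poly_coefficients = np.polyfit(days, cases, deg=2)

# Wrap the coefficients in a callable polynomial and predict day 6.
predict_cases = np.poly1d(poly_coefficients)
day_6_prediction = predict_cases(6.0)
print(poly_coefficients, day_6_prediction)
```

The model is still linear in its coefficients, which is why the same least-squares machinery applies. Choosing the degree is a judgment call: too high a degree tends to overfit.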
Logistic Regression
It is a type of regression used to model the
relationship between a dependent variable and
one or more independent variables when the
dependent variable is binary. For example, deciding
whether to admit a student to a class based
on their grades, skill set, and experience.
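The sketch below shows one way to fit a logistic regression with plain gradient descent in NumPy. The standardized grades, the admission labels, the learning rate, and the iteration count are all assumptions made for illustration. A library implementation would normally be preferred in practice.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, learning_rate=0.1, n_iterations=5000):
    """Fit weights [intercept, coef_1, ...] by gradient descent on the log loss."""
    X_design = np.column_stack([np.ones(len(X)), X])
    weights = np.zeros(X_design.shape[1])
    for _ in range(n_iterations):
        probabilities = sigmoid(X_design @ weights)
        # Gradient of the average log loss with respect to the weights.
        gradient = X_design.T @ (probabilities - y) / len(y)
        weights -= learning_rate * gradient
    return weights


def predict_proba(X, weights):
    """Return the predicted probability of the positive class for each row."""
    X_design = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X_design @ weights)


# Hypothetical standardized grades and admission decisions (1 = admitted).
grades = np.array([[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]])
admitted = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

weights = fit_logistic(grades, admitted)
probabilities = predict_proba(grades, weights)
predicted_labels = (probabilities >= 0.5).astype(int)
print(weights, predicted_labels)
```

The 0.5 cut-off is a common default rather than a requirement. It can be moved when false positives and false negatives carry different costs.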
Other Types of Regression


Ridge Regression: It is a type of regression that
reduces overfitting in multiple linear regression
by adding an L2 penalty (proportional to the sum of
the squared coefficients) to the cost function.
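A minimal sketch of ridge regression using its closed-form solution, w = (X^T X + penalty * I)^(-1) X^T y, is given below. It assumes the features are already centered so that no intercept is needed. The data and the function name fit_ridge are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, penalty):
    """Closed-form ridge solution without an intercept (assumes centered data)."""
    n_features = X.shape[1]
    # The penalty adds to the diagonal, which shrinks the coefficients towards zero.
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)


# Hypothetical data generated exactly from y = 2 * x1 + 3 * x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0, 1.0])

ols_weights = fit_ridge(X, y, penalty=0.0)    # no penalty: ordinary least squares
ridge_weights = fit_ridge(X, y, penalty=1.0)  # penalty shrinks both coefficients
print(ols_weights, ridge_weights)
```

A larger penalty gives more shrinkage. The penalty strength is usually chosen by cross-validation.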


Lasso Regression: It is a type of regression that
performs feature selection in multiple linear
regression by adding an L1 penalty (proportional to
the sum of the absolute values of the coefficients)
to the cost function. This penalty encourages the
coefficients of some of the independent variables
to be exactly zero.
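The following sketch illustrates how an L1 penalty can set a coefficient to exactly zero. It uses coordinate descent with soft-thresholding on the objective 0.5 * ||y - Xw||^2 + penalty * ||w||_1, which is one common way to fit a lasso model. The data, the penalty value, and the function names are assumptions for illustration.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 if it is within the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_lasso(X, y, penalty, n_sweeps=100):
    """Coordinate descent for 0.5 * ||y - Xw||^2 + penalty * ||w||_1 (no intercept)."""
    n_features = X.shape[1]
    weights = np.zeros(n_features)
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            residual_without_j = y - X @ weights + X[:, j] * weights[j]
            rho = X[:, j] @ residual_without_j
            weights[j] = soft_threshold(rho, penalty) / (X[:, j] @ X[:, j])
    return weights


# Hypothetical data: the second feature has only a weak effect (0.1).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
y = 2.0 * X[:, 0] + 0.1 * X[:, 1]

lasso_weights = fit_lasso(X, y, penalty=1.0)
print(lasso_weights)
```

In this example the weak feature is dropped entirely while the strong one is kept with a shrunken coefficient, which is the feature-selection behaviour described above.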

ElasticNet Regression: It is a type of regression analysis
that combines the L1 and L2 regularization methods of
Lasso and Ridge regression, respectively. It is used for linear
regression problems where the number of independent
variables is greater than the number of observations or
when the independent variables are highly correlated with
each other.


Bayesian Regression: In Bayesian Regression, the prior 
distribution of the parameters is specified before the data is 
observed. This prior distribution represents the researcher's 
belief about the parameters before any data is collected. 
Once the data is observed, the prior distribution is updated 
to become the posterior distribution, which represents the 
researcher's belief about the parameters after observing the 
data. 
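The prior-to-posterior update can be sketched for the simplest conjugate case: a zero-mean Gaussian prior on the weights and Gaussian noise with known variance. These modelling choices, the data, and the function name posterior_over_weights are assumptions made for illustration, not something the slides specify.

```python
import numpy as np


def posterior_over_weights(X, y, prior_variance, noise_variance):
    """Posterior mean and covariance for weights with prior N(0, prior_variance * I)."""
    n_features = X.shape[1]
    prior_precision = np.eye(n_features) / prior_variance
    # Observing data adds precision (inverse variance) to the prior.
    posterior_precision = prior_precision + X.T @ X / noise_variance
    posterior_covariance = np.linalg.inv(posterior_precision)
    posterior_mean = posterior_covariance @ (X.T @ y) / noise_variance
    return posterior_mean, posterior_covariance


# Hypothetical data generated from y = 2 * x1 + 3 * x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0, 1.0])

posterior_mean, posterior_covariance = posterior_over_weights(
    X, y, prior_variance=1.0, noise_variance=1.0
)
print(posterior_mean)
print(posterior_covariance)
```

In this setup the posterior variance of each weight is smaller than its prior variance, reflecting what was learned from the data. The posterior mean also coincides with a ridge estimate whose penalty equals noise_variance / prior_variance.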
Usage of Regression

1. Forecast continuous outcomes such as house
prices, stock prices, or sales.

2. Predict the success of future retail sales or
marketing campaigns so that resources are used
effectively.

3. Predict customer or user trends, such as on
e-commerce websites.
Challenges


Overfitting: Overfitting occurs when the model is too 
complex and fits the training data too closely, resulting in 
poor performance on new or unseen data. It can be 
addressed using regularization techniques like L1 and L2 
regularization or early stopping. 


Underfitting: Underfitting occurs when the model is too 
simple and fails to capture the underlying patterns in the 
data. It can be addressed by increasing the complexity of 
the model or by adding more relevant features. 


Multicollinearity: Multicollinearity occurs when two or 
more independent variables are highly correlated with 
each other. This can make it difficult to estimate the 
individual effects of each variable and can lead to 
unstable parameter estimates. 
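One common diagnostic for multicollinearity is the variance inflation factor (VIF), computed as 1 / (1 - R^2) where R^2 comes from regressing each feature on the others. The sketch below is an illustrative NumPy implementation. The feature values are made up, and the threshold of 10 mentioned afterwards is a rule of thumb rather than a fixed standard.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X, regressing it on the remaining columns."""
    n_samples, n_features = X.shape
    factors = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefficients
        total = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - (residuals @ residuals) / total
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)


# Hypothetical features: the second is almost exactly twice the first,
# while the third is unrelated to both.
feature_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
feature_b = 2.0 * feature_a + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
feature_c = np.array([1.0, -1.0, 0.0, 0.0, -1.0, 1.0])

vif = variance_inflation_factors(np.column_stack([feature_a, feature_b, feature_c]))
print(vif)
```

Values far above roughly 10 are often taken as a sign that a feature is close to a linear combination of the others. Dropping or combining such features, or using ridge regression, are typical responses.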


Outliers: Outliers are data points that are significantly
different from the rest of the data. They can have a large
influence on the regression model and can lead to
inaccurate estimates of the parameters.
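The effect of a single outlier can be seen in the small sketch below. It compares an ordinary least-squares slope with the Theil-Sen estimator (the median of all pairwise slopes), which is brought in here as one example of a more robust alternative and is not something the slides mention. The data is synthetic.

```python
from itertools import combinations

import numpy as np


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


def theil_sen_slope(x, y):
    """Median of the slopes between every pair of points."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
    ]
    return float(np.median(slopes))


# Synthetic data on the line y = 2 * x + 1, then with the last point corrupted.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_clean = 2.0 * x + 1.0
y_with_outlier = y_clean.copy()
y_with_outlier[-1] = 50.0

clean_slope = least_squares_slope(x, y_clean)
outlier_slope = least_squares_slope(x, y_with_outlier)
robust_slope = theil_sen_slope(x, y_with_outlier)
print(clean_slope, outlier_slope, robust_slope)
```

One corrupted point moves the least-squares slope a long way from 2, while the median-based estimate is unaffected in this example.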


Non-linearity: Linear regression models assume a linear
relationship between the independent and dependent
variables. When the true relationship is non-linear,
predictions can be inaccurate unless the model is
extended, for example with polynomial terms.


Missing data: Missing data can make it difficult to 
estimate the parameters of the regression model and can 
lead to biased estimates if not handled properly. 
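Two simple ways of handling missing values are sketched below: dropping incomplete rows and replacing missing entries with the column mean. Both are baseline techniques chosen for illustration. Either can bias the estimates if values are not missing at random. The data is hypothetical.

```python
import numpy as np

# Hypothetical feature with two missing entries encoded as NaN.
experience = np.array([1.0, np.nan, 3.0, 4.0, np.nan, 8.0])

# Option 1: keep only the rows where the value is present.
complete_rows = ~np.isnan(experience)
experience_complete = experience[complete_rows]

# Option 2: replace each missing value with the mean of the observed values.
mean_experience = np.nanmean(experience)
experience_imputed = np.where(np.isnan(experience), mean_experience, experience)

print(experience_complete, experience_imputed)
```

Dropping rows discards information, while mean imputation keeps every row but understates the variability of the feature.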
Follow #DataRanch on LinkedIn for more...
info@dataranch.org
linkedin.com/company/dataranch


